Longest Common Prefixes with k-Mismatches & Applications

Authors

  • Hayam Alamro
  • Lorraine A.K. Ayad
  • Panagiotis Charalampopoulos
  • Costas S. Iliopoulos
  • Solon P. Pissis
Abstract

We propose a new algorithm for computing the longest prefix of each suffix of a given string of length n over a constant-sized alphabet of size σ that occurs elsewhere in the string with Hamming distance at most k. Specifically, we show that the proposed algorithm requires time O(n (σR)^k log log n (log k + log log n)) on average, where R = ⌈(k + 2)(log_σ n + 1)⌉, and space O(n). This improves upon the state-of-the-art average-case time complexity for the case when k = 1 [Manzini, SPIRE 2015] by a factor of log n / log log n. In addition, we show how the proposed technique can be adapted and applied in order to compute the longest previous factors under the Hamming distance model within the same complexities. In terms of real-world applications, we show that our technique can be directly applied to the problem of genome mappability.
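
To make the problem statement concrete, the following is a minimal brute-force sketch of the quantity being computed: for every suffix, the length of its longest prefix that also occurs at some other position with at most k mismatches. It is not the algorithm proposed in the paper (it runs in roughly cubic time in the worst case) and the function name `longest_prefix_k_mismatches` is an illustrative assumption, not a name from the source.

```python
def longest_prefix_k_mismatches(s: str, k: int) -> list[int]:
    """Brute-force reference (NOT the paper's algorithm).

    result[i] is the length of the longest prefix of s[i:] that occurs
    starting at some other position j != i with Hamming distance <= k.
    """
    n = len(s)
    result = [0] * n
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            length = 0
            mismatches = 0
            # Extend the alignment of s[i:] and s[j:] while the mismatch
            # budget k is not exceeded and both positions stay in range.
            while i + length < n and j + length < n:
                if s[i + length] != s[j + length]:
                    if mismatches == k:
                        break
                    mismatches += 1
                length += 1
            result[i] = max(result[i], length)
    return result


if __name__ == "__main__":
    print(longest_prefix_k_mismatches("abcabd", 0))
    print(longest_prefix_k_mismatches("abcabd", 1))
```

With k = 0 this reduces to the exact longest repeated prefix of each suffix. A sketch like this is mainly useful as a correctness oracle when testing a faster implementation on small inputs.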

Similar resources

Longest Common Substring with Approximately k Mismatches

In the longest common substring problem we are given two strings of length n and must find a substring of maximal length that occurs in both strings. It is well-known that the problem can be solved in linear time, but the solution is not robust and can vary greatly when the input strings are changed even by one letter. To circumvent this, Leimeister and Morgenstern introduced the problem of the...

Longest common substrings with k mismatches

The longest common substring with k-mismatches problem is to find, given two strings S1 and S2, a longest substring A1 of S1 and A2 of S2 such that the Hamming distance between A1 and A2 is ≤ k. We introduce a practical O(nm) time and O(1) space solution for this problem, where n and m are the length of S1 and S2, respectively. This algorithm can also be used to compute the matching statistics ...

Longest common substrings with k mismatches (arXiv:1409.1694v2 [cs.DS], 16 Mar 2015)

The longest common substring with k-mismatches problem is to find, given two strings S1 and S2, a longest substring A1 of S1 and A2 of S2 such that the Hamming distance between A1 and A2 is ≤ k. We introduce a practical O(nm) time and O(1) space solution for this problem, where n and m are the lengths of S1 and S2, respectively. This algorithm can also be used to compute the matching statistics...

Quantifying the Pitfalls of Traceroute in AS Connectivity Inference

Although traceroute has the potential to discover AS links that are invisible to existing BGP monitors, it is well known that the common approach for mapping router IP address to AS number (IP2AS) based on the longest prefix matching is highly error-prone. In this paper we conduct a systematic investigation into the potential errors of the IP2AS mapping for AS topology inference. In comparing t...

Computing the Longest Common Prefix of a Context-free Language in Polynomial Time

We present two structural results concerning longest common prefixes of non-empty languages. First, we show that the longest common prefix of the language generated by a context-free grammar of size N equals the longest common prefix of the same grammar where the heights of the derivation trees are bounded by 4N. Second, we show that each non-empty language L has a representative subset of at ...
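
For the longest common substring with k mismatches problem described in the entries above, one simple way to reach O(nm) time with O(1) extra space is to scan every diagonal of the implicit n × m alignment grid with a two-pointer window that never contains more than k mismatches. The sketch below follows that idea. It is an assumption that this matches the spirit of the cited solution, since the listing only states the bounds, and the name `lcs_k_mismatches` is hypothetical.

```python
def lcs_k_mismatches(s1: str, s2: str, k: int) -> tuple[int, int, int]:
    """Return (length, start_in_s1, start_in_s2) of a longest pair of
    equal-length substrings whose Hamming distance is at most k.

    Illustrative sketch: O(len(s1) * len(s2)) time, O(1) extra space.
    """
    n, m = len(s1), len(s2)
    best = (0, 0, 0)
    # Each diagonal d fixes the offset i - j between the two strings.
    for d in range(-(m - 1), n):
        i0 = max(d, 0)       # first index in s1 on this diagonal
        j0 = i0 - d          # matching first index in s2
        diag_len = min(n - i0, m - j0)
        start = 0            # left end of the current window
        mismatches = 0       # mismatches inside the current window
        for end in range(diag_len):
            if s1[i0 + end] != s2[j0 + end]:
                mismatches += 1
                # Shrink from the left until the budget k is respected.
                while mismatches > k:
                    if s1[i0 + start] != s2[j0 + start]:
                        mismatches -= 1
                    start += 1
            window = end - start + 1
            if window > best[0]:
                best = (window, i0 + start, j0 + start)
    return best


if __name__ == "__main__":
    print(lcs_k_mismatches("abcde", "abxde", 1))
```

Each diagonal is traversed once by `end` and at most once by `start`, which is where the O(nm) total comes from. Only a constant number of counters is kept.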

Publication date: 2017